Efficient and Effective Analysis of Data Quality using Pattern Tableaux

Authors

  • Lukasz Golab
  • Flip Korn
  • Divesh Srivastava
Abstract

Data Auditor is a system for analyzing data quality via exploring data semantics. Given a user-supplied constraint, such as a functional dependency or an inclusion dependency, the system computes pattern tableaux, which are concise summaries of subsets of the data that satisfy (or fail) the constraint. The engine of Data Auditor is an efficient algorithm for finding these patterns, which defers expensive computation on patterns until needed during search, thereby pruning wasted effort. We demonstrate the utility of our approach on a variety of data as well as the performance gain from employing this algorithm.
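The idea of a pattern tableau for a functional dependency can be illustrated with a minimal sketch. This is not the Data Auditor algorithm, which the abstract does not spell out. It is a simple greedy selection under several assumptions: the names `fd_confidence`, `build_tableau`, `min_conf`, and `min_cover` are hypothetical, as are the sample relation and its attributes. Confidence is taken to be the largest fraction of matching tuples that can be kept so that the dependency holds exactly. Candidates are ranked by a cheap coverage count, and the costlier confidence is computed only when a candidate is actually considered, loosely mirroring the deferred computation the abstract mentions.

```python
from collections import Counter, defaultdict
from itertools import product

WILDCARD = "*"  # a pattern entry that matches any value


def matches(pattern, row, attrs):
    """True if the row agrees with the pattern on every non-wildcard attribute."""
    return all(p == WILDCARD or row[a] == p for p, a in zip(pattern, attrs))


def fd_confidence(rows, lhs, rhs):
    """Largest fraction of rows that can be kept so that lhs -> rhs holds."""
    if not rows:
        return 0.0
    groups = defaultdict(Counter)
    for r in rows:
        groups[r[lhs]][r[rhs]] += 1
    kept = sum(max(c.values()) for c in groups.values())
    return kept / len(rows)


def candidate_patterns(rows, attrs):
    """Every generalization of every row: each attribute is its value or '*'."""
    cands = set()
    for r in rows:
        for mask in product([True, False], repeat=len(attrs)):
            cands.add(tuple(r[a] if keep else WILDCARD
                            for a, keep in zip(attrs, mask)))
    return cands


def build_tableau(rows, attrs, lhs, rhs, min_conf, min_cover):
    """Greedily pick high-confidence patterns until min_cover of rows is covered."""
    covered, tableau = set(), []
    remaining = candidate_patterns(rows, attrs)

    def gain(p):
        # Cheap step: number of not-yet-covered rows the pattern matches.
        return sum(1 for i, r in enumerate(rows)
                   if i not in covered and matches(p, r, attrs))

    while len(covered) < min_cover * len(rows):
        chosen = None
        for p in sorted(remaining, key=lambda q: (-gain(q), q)):
            if gain(p) == 0:
                break
            # Expensive step, deferred until the candidate is examined.
            conf = fd_confidence([r for r in rows if matches(p, r, attrs)],
                                 lhs, rhs)
            if conf >= min_conf:
                chosen = (p, conf)
                break
            remaining.discard(p)  # never re-evaluate a failed pattern
        if chosen is None:
            break
        tableau.append(chosen)
        remaining.discard(chosen[0])
        covered.update(i for i, r in enumerate(rows)
                       if matches(chosen[0], r, attrs))
    return tableau


# Hypothetical relation: the dependency zip -> city holds in "east" only.
ROWS = [
    {"region": "east", "zip": "10001", "city": "NYC"},
    {"region": "east", "zip": "10001", "city": "NYC"},
    {"region": "east", "zip": "10002", "city": "NYC"},
    {"region": "west", "zip": "90001", "city": "LA"},
    {"region": "west", "zip": "90001", "city": "SF"},
    {"region": "west", "zip": "90002", "city": "LA"},
    {"region": "west", "zip": "90002", "city": "SD"},
]

tableau = build_tableau(ROWS, ("region",), "zip", "city",
                        min_conf=0.9, min_cover=0.4)
print(tableau)
```

In this toy relation the all-wildcard pattern and the "west" pattern fall below the confidence threshold, so the resulting tableau summarizes the subset on which the dependency holds with the single pattern for "east".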

Similar resources

Data Auditor: Exploring Data Quality and Semantics using Pattern Tableaux

We present Data Auditor, a tool for exploring data quality and data semantics. Given a rule or an integrity constraint and a target relation, Data Auditor computes pattern tableaux, which concisely summarize subsets of the relation that (mostly) satisfy or (mostly) fail the constraint. This paper describes 1) the architecture and user interface of Data Auditor, 2) the supported constraints for ...

Discovering Pattern Tableaux for Data Quality Analysis: a Case Study

In this paper, we present a case study that illustrates the utility of pattern tableau discovery for data quality analysis. Given a user-supplied integrity constraint, such as a boolean predicate expected to be satisfied by every tuple, a functional dependency, or an inclusion dependency, a pattern tableau is a concise summary of subsets of the data that satisfy or fail the constraint. We descri...

Selecting Energy Efficient Poultry Egg Producers: A Fuzzy Data Envelopment Analysis Approach

This study examined the energy use pattern of poultry for egg production farms of Iran and ranked the selected farmers using fuzzy data envelopment analysis (FDEA) from the viewpoint of energy efficiency. Since data used in our study were not measured precisely, fuzzy forms of them could help us to reach the ideal situations. Hence, the conventional data envelopment analysis (DEA) was remod...

Designing an Optimal Pattern of General Medical Course Curriculum: an Effective Step in Enhancing How to Learn

Introduction: In today's world with a vast amount of information and knowledge, medical students should learn how to become effective physicians. Therefore, the competencies required for lifelong learning in the curriculum must be considered. The purpose of this study was to present a desirable general medical curriculum with emphasis on lifelong learning. Methods: The present study was Mixe...

A General Approach to Mining Quality Pattern-Based Clusters from Microarray Data

Pattern-based clustering has broad applications in microarray data analysis, customer segmentation, e-business data analysis, etc. However, pattern-based clustering often returns a large number of highly overlapping clusters, which makes it hard for users to identify interesting patterns from the mining results. Moreover, a general model for pattern-based clustering is lacking. Different kin...

Journal:
  • IEEE Data Eng. Bull.

Volume 34

Publication date: 2011